-
- UASM.DOC
-
- UASM (for Unassembler) consists of five files at this
- time: UASM.DOC, UASM-JMP.BAS, UASM-INT.BAS, UASM-STR.BAS and
- UASM-DOS.MAC, with the purpose of converting the unassembled
- listing of a .COM file from DEBUG into a .ASM file which can
- be modified and re-assembled with the Macro assembler.
-
- **************************** NOTICE ****************************
-
- USER SUPPORTED SOFTWARE (With thanks to Andrew Flugelman)
-
- A limited license is granted to all users of these programs,
- to make and distribute copies for other users subject to the
- following conditions:
-
- 1. None of the notices or credits are to be bypassed,
- altered, or removed.
- 2. The programs are not to be distributed in modified
- form. (Users are encouraged to distribute MERGE
- files.)
- 3. No fee is to be charged (or any other consideration
- received) for copying or distributing the programs
- without an express written agreement with White Crane
- Systems.
- ***************************************************************
-
- UASM - The White Crane Systems Unassembler
-
- If you are using these programs and finding them of value
- please send a cash contribution to support their upkeep and
- distribution. Use the UASM system of programs to unassemble
- one average length .COM file, look over the results and calcu-
- late how many hours this would have taken you to produce.
- Multiply this by the minimum wage, contribute that amount,
- and use the program free thereafter. If that's too much just
- send $20. Supporters will receive free notice of enhancements
- and updates.
- In any case you are encouraged to copy and distribute
- UASM to your friends provided you do so free of charge and
- in unmodified form.
-
- Guy C. Gordon
- White Crane Systems
- 3194 Friar Tuck Way
- Doraville, GA 30340
-
-
- INTRODUCTION
-
- The strategy used in this system is to capture the output
- of DEBUG and run it through a series of BASIC programs, each
- of which modifies one type of statement in the listing, making
- it more like an .ASM source file. This keeps each program
- short and fast, and allows you to look over the output at each
- step to make sure no mistakes have been entered. It also makes
- the programs easy to understand and improve, as new steps can
- be added without interfering with the first steps. Later in
- its development UASM will combine these steps. I hope that
- users of these programs will send me their improvements so
- that I may add them to future releases.
-
- UASM-JMP takes captured unassembled code from DEBUG (which
- we will name FILE.DB) and finds all addresses referenced by
- the various Jump, Call, and Loop instructions. These referenced
- addresses are made into labels of the form Lhhhh (where hhhh
- is the hex address). A new file (FILE.JMP) is then written
- in the form of assembler source code. All of the addresses
- and hex opcodes in the left two columns of the DEBUG listing
- are left out. Referenced lines are appropriately labeled as
- Lhhhh:.
-
- UASM-INT reads FILE.JMP and writes FILE.INT in which it
- has added Macro calls and comments explaining the various Inter-
- rupts. The macros, symbols, and comments are read from the
- file UASM-DOS.MAC. This file contains a table of EQUates which
- define the symbols for the various DOS function calls and the
- DOSCALL macro. It is included in FILE.INT by means of an
- INCLUDE directive.
-
- UASM-STR reads FILE.INT and writes FILE.STR, attempting
- to find all strings and variables used by FILE.COM. When it
- finds an address it reads the string or variable from FILE.COM
- and generates the appropriate data statement (e.g. Dhhhh DB
- 'string') which it appends to FILE.STR, and comments each
- line of code which references that address.
-
- From that point on, you must take over and supply the
- remaining text strings and variables that are addressed. You
- should heavily comment the code as you go through it and change
- the labels that UASM has assigned into more meaningful names.
- This is best done with the global change command in your text
- editor. I also recommend using the Macro CREF program to obtain
- a cross reference map of the symbols.
-
- These programs are by no means infallible, and they can
- no more read the programmer's mind than you or I, so you will
- have to check the output closely. If you expect to simply
- run UASM and be handed a usable source file you're going to
- be disappointed. On the other hand, if you've ever tried to
- understand a program from just a DEBUG listing you will be
- pleasantly surprised. UASM will aid you in studying other
- programs by doing a lot of the dirty work for you, but if you
- don't study the code you won't get usable output.
-
- I have been using these programs to unassemble DEBUG.COM
- and COMMAND.COM. When I have them sufficiently commented I
- will post them on the BBS's. It is my hope that UASM will
- lead to a whole library of well commented, "reverse engineered"
- source code for the MS-DOS operating system and utilities.
- I would appreciate it if anyone else working on the same would
- upload their results to the BBS. Suggestions and improvements
- to UASM are welcome. I may be contacted through any of the
- IBM-PC BBS's in Atlanta, or write:
-
- Guy C. Gordon
- White Crane Systems
- 3194 Friar Tuck Way
- Doraville, GA 30340
-
-
- OPERATING INSTRUCTIONS
- -DEBUG-
-
- As an example, we will unassemble a fictitious file,
- FILE.COM:
- A>debug file.com
- -r
- .....CX=1780 ... ;file length in hex bytes
- -d 100 l 1780 ;display entire file
-
- In the listing that follows you should be able to spot
- ASCII text and any regular binary tables. Write down the be-
- ginning and ending addresses of these, as we do not want to
- unassemble them, but we will want a printed copy. Our aim
- is to put together a list of all blocks of code to be unassem-
- bled and string addresses for UASM-STR. Look at the code before
- each block of text. Usually it will be preceded by a hex C3
- which is a RET instruction, but there may be a JMP, JMPS, IRET,
- or RETF instead. This is the last instruction we want to unas-
- semble in the block of code preceding the text. Take your
- time and go through the entire file, unassembling code and
- making sure that the output looks reasonable.
-
- Reasonable code contains such things as CALL or Jump in-
- structions to nearby addresses, INT 21 instructions and multiple
- operations on single registers. It does not contain DB instruc-
- tions or very many 00 bytes. Also the ASCII display of a sec-
- tion of code will look totally random, with about 50% of it
- being displayable characters. (The rest will be periods.)
- Peter Norton has given a good demonstration of this in chapter
- 6 of "Inside the IBM-PC". One warning--the DEBUG unassembler
- tends to lock into phase with the correct code, which is very
- nice, but be certain that the beginning few instructions are
- also in phase. Sections of code that are in phase will contain
- Jumps and CALLs to other sections, thus telling you where to
- start unassembling.
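
The "reasonable code" test above can be roughed out numerically. The following is only a sketch in Python (not part of UASM); the function names are hypothetical, and it assumes the ASCII column of a DEBUG dump shows bytes 20 through 7E as characters and everything else as a period.

```python
def printable_fraction(data: bytes) -> float:
    """Fraction of bytes a dump would show as characters (assumed 20h..7Eh)."""
    if not data:
        return 0.0
    shown = sum(1 for b in data if 0x20 <= b <= 0x7E)
    return shown / len(data)


def zero_fraction(data: bytes) -> float:
    """Fraction of 00 bytes; code is expected to contain few of them."""
    if not data:
        return 0.0
    return data.count(0) / len(data)
```

A block whose printable fraction is near 1.0 is probably text, one with a large zero fraction is probably a table or buffer, and a block near 0.5 with few zeros is a candidate for unassembly. These thresholds are rules of thumb taken from the paragraph above, not hard limits.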
-
- At the end of this investigation of the .COM file you
- should have a list of the starting and ending addresses of
- all the code blocks and all the string blocks. The next step
- depends upon whether you have DOS 2.0 or not. It is much easier
- if you have 2.0, or can do this part on a friend's machine
- that has it. This is because under DOS 2.0 we can pipe the
- output of DEBUG into a file thus capturing the unassembled
- code for input to UASM-JMP. Under DOS version 1 we must modify
- DEBUG (using DEBUG of course) to get it to write the file we
- need.
-
-
- DEBUG - 2.0 Instructions
-
- Create a file, FILE.IN, with the following DEBUG instruc-
- tions:
-
- u addr 1 addr 2 ;addresses of blocks of
- u addr 3 addr 4 ; code to unassemble
- u addr 5 addr 6 ; from our initial investigation
- q ;Must have Quit instruction at end
-
- Now we can run DEBUG and pipe the output to a disk file.
-
- DEBUG FILE.COM <FILE.IN >FILE.DB
-
- FILE.DB is the input for UASM-JMP.
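
If the list of code blocks is long, FILE.IN can be generated rather than typed. This is a minimal sketch in Python, not part of UASM; the function name debug_script is hypothetical, and it assumes DEBUG accepts one "u start end" command per line with CR/LF line endings.

```python
def debug_script(blocks):
    """Build the text of FILE.IN from (start, end) address pairs.

    One Unassemble command is written per block, followed by the
    Quit command that must end the script.
    """
    lines = [f"u {start:X} {end:X}" for start, end in blocks]
    lines.append("q")
    return "\r\n".join(lines) + "\r\n"
```

For example, debug_script([(0x100, 0x4D5), (0x6B0, 0x799)]) yields the three lines "u 100 4D5", "u 6B0 799" and "q", which can be written to FILE.IN in binary mode.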
-
-
-
-
- DEBUG - 1.1 Instructions
-
- While it is quite easy to capture the output of DEBUG under
- DOS 2.0 since we can pipe it to a file, under earlier versions
- of DOS we have no such option. However, DEBUG is an exceptional-
- ly powerful program, and already contains the code necessary
- to write a disk file with the Write command. We will use this
- to capture the Unassembled code.
-
- If we unassemble and examine DEBUG, we can find the follow-
- ing subroutine:
-
- 02C8:02C0 PUSH AX ;save registers
- PUSH DX
- AND AL,7F ;ensure character is ASCII
- XCHG DX,AX ;put character in DL
- MOV AH,02 ;DOS Function 2 to display DL
- INT 21
- POP DX ;restore registers
- POP AX
- RET ;return
-
- As it turns out, DEBUG does all screen output through
- this subroutine. Thus we can modify just this subroutine and
- capture each character as it is displayed. What we will do
- with it is write it out to an unused portion of memory. From
- there we can write all the output to a file using the Write
- command.
-
-
- Our subroutine to store character AL in consecutive memory
- locations will be very small--about 20 bytes. We'll need some-
- place to put it. For DEBUG 1.07 I chose to put it inside a
- string which is only printed once--the message "DEBUG version
- 1.07" located at CS:0102. Here is the subroutine:
-
- 02C8:0102 DW 3300 ;pointer to memory
- PUSH DI ;save index register
- SEG CS ;offset from code, not ES
- MOV DI,[0102] ;get pointer
- SEG CS ;
- STOSB ;store char in AL into memory
- SEG CS ;
- MOV [0102],DI ;store incremented pointer
- POP DI ;restore register
- XCHG DX,AX ;complete the instructions that
- MOV AH,02 ; CALL to this routine replaced
- RET ;Return to Display routine
-
- We can store this subroutine over the string with the
- Enter command. (here 02C8 is the base segment where DEBUG is
- loaded on my system):
-
- E 2C8:102 00 33 57 2E 8B 3E 02 01 2E AA 2E 89 3E 02 01 5F 92
- B4 02 C3
-
- We can check that this was entered correctly by Unassembling
- it:
-
- U 2C8:104 ;you should see the subroutine listed above.
-
- The choice of memory location is up to you. 3300 is the
- value I used while unassembling DEBUG. It should be larger
- than the sum of the sizes (in bytes) of DEBUG and the program
- you are unassembling. To have this subroutine called each
- time DEBUG writes a character, we insert a subroutine Call:
-
- E 2C8:2C4 E8 3D FE ;Call 0104
-
- This puts a CALL 0104 in place of XCHG DX,AX and MOV AH,02.
- That is why we perform those instructions before returning
- to the display routine. The very next character printed by DEBUG
- after you Enter the above command will be stored in location
- 2C8:3300 as well as displayed on the screen.
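
If your display routine or capture routine sits at a different offset, the three bytes of the CALL must be recomputed. The sketch below (Python, with the hypothetical name near_call_bytes) shows the arithmetic for an 8088 near CALL: opcode E8 followed by a 16-bit displacement, stored low byte first, measured from the instruction after the CALL.

```python
def near_call_bytes(call_addr: int, target: int) -> bytes:
    """Encode a near CALL placed at call_addr that transfers to target."""
    next_ip = call_addr + 3                  # the CALL itself is 3 bytes long
    disp = (target - next_ip) & 0xFFFF       # wrap to an unsigned 16-bit value
    return bytes([0xE8, disp & 0xFF, disp >> 8])
```

For the addresses used above, near_call_bytes(0x2C4, 0x104) gives E8 3D FE: the next instruction is at 02C7, and 0104 - 02C7 wraps to FE3D.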
-
-
- Immediately after entering the CALL instruction above
- you should begin the Unassemble commands that you determined
- will give you all the code for the program.
-
- U 100 4D5
- U 6b0 799
- etc.
- D 2C8:102 103 ;This displays the pointer to the end of text
- B3 D9 ;This means we filled memory to D9B3
- ;(remember the 8088 stores words LSB first)
- H D9B3 3300 ;Hex arithmetic
- 0CB3 A6B3 ; D9B3 - 3300 = A6B3
- R CX
- CX=1748
- :A6B3 ;load CX register with # of bytes to write
- N FILE.DB ;name the output file
- W 2C8:3300 ;writing from 3300 off. from DEBUG base
- Writing A6B3 bytes
- E 2C8:102 00 33 ;reset pointer if out of space
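
The pointer arithmetic in the session above can be checked away from DEBUG. This is a small sketch in Python; the function names are hypothetical. It decodes the two pointer bytes as a little-endian word and reproduces what the H command prints (sum and difference, each truncated to 16 bits).

```python
def captured_length(pointer_bytes: bytes, buffer_start: int) -> int:
    """Bytes captured so far: end pointer (stored LSB first) minus buffer start."""
    end = int.from_bytes(pointer_bytes, "little")
    return end - buffer_start


def hex_arithmetic(a: int, b: int):
    """Mimic DEBUG's H command: (a + b, a - b), both as 16-bit values."""
    return (a + b) & 0xFFFF, (a - b) & 0xFFFF
```

With the bytes B3 D9 and a buffer at 3300, captured_length returns A6B3 hex, the value loaded into CX before the Write command.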
-
- Remember, you can only write text to memory up to
- 2C8:FFFF. If you exceed that you will write over DEBUG at 2C8:0000
- and will probably have to re-boot. If FILE.COM is too big
- to Unassemble in one pass you'll have to do it in pieces and
- append them together with your text editor. For this reason
- it is a good idea to modify and save a copy of DEBUG under
- another name such as UDEBUG. If you need to perform any other
- operations with a modified DEBUG that you do not want written
- to memory you can restore DEBUG to normal operation with:
-
- E 2C8:2C4 92 B4 02 ;restores XCHG DX,AX and MOV AH,02
-
- Now text edit FILE.DB and remove any extraneous lines such
- as debug prompts that might have been displayed. If there
- are any TABs in FILE.DB they will confuse UASM-JMP and the
- others. DEBUG 1.1 appears to put a TAB after each instruction
- while version 2.0 does not. I always use the text editor to
- change all TABs to the appropriate number of spaces. (Users
- of PMATE, use the YF command.)
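
If your editor has no such command, the TABs can be expanded with a few lines of code. This is a sketch in Python, not part of UASM; the name detab is hypothetical, and the tab stop of 8 columns is an assumption about how DEBUG's output was meant to line up.

```python
def detab(text: str, tab_width: int = 8) -> str:
    """Replace each TAB with spaces up to the next tab stop.

    str.expandtabs restarts its column count after every line break,
    so a whole file can be passed in one call.
    """
    return text.expandtabs(tab_width)
```

Read FILE.DB, pass its contents through detab, and write the result back before running UASM-JMP.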
-
- Any of the memory addresses above may vary with your operat-
- ing system and DEBUG version. The values given are for the
- Victor 9000, MS-DOS 1.25a, and DEBUG 1.07. The Base Segment
- where DEBUG is loaded (2C8 above) will depend upon your machine
- and operating system, and is found by using DEBUG to Search
- for itself in memory. The display subroutine (2C0 above) de-
- pends upon your DEBUG version number. The same subroutine
- occurs at 2B5 in the DEBUG that comes with PC-DOS 1.10, and
- will appear near these locations in any other version 1 DEBUGs.
- If you store the capture subroutine at some other place in
- memory you need to change the two [0102] references and the
- CALL 0104 instruction.
-
-
- UASM-JMP Instructions
-
- Run UASM-JMP as you would any BASIC program. It will
- prompt you for the name of input and output files. Respond
- with FILE.DB, which we created above, and B:FILE.JMP for
- output. If file extensions are not provided, .DB and .JMP will
- be assumed for input and output respectively. Also the output
- file name will default to the input file name. I highly recom-
- mend putting these files on separate drives if you don't have
- a fixed disk or a RAM disk. This will speed up the program
- and save wear on your floppies.
-
- UASM-JMP will make two passes through the input file.
- On the first pass it will build a list of all referenced lines.
- It then sorts this list (shell sort), eliminates duplicate
- references, and on the second pass, labels all of the referenc-
- es. The output will be displayed on your screen as well as
- written out on the second pass.
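
The two passes can be illustrated with a short sketch. It is written in Python rather than BASIC and is not the UASM-JMP source: the function names are hypothetical, a sorted set stands in for the shell sort, and each input line is assumed to have the form "ssss:oooo hexbytes MNEMONIC operands".

```python
import re

BRANCH = re.compile(r"^(J[A-Z]*|CALL|LOOP[A-Z]*)$")   # Jump, Call, Loop family
HEX4 = re.compile(r"^[0-9A-F]{4}$")                   # a bare 4-digit address


def parse(line):
    """Split one DEBUG line into (offset, mnemonic, operands)."""
    fields = line.split(None, 3)
    offset = int(fields[0].split(":")[1], 16)
    operands = fields[3].strip() if len(fields) > 3 else ""
    return offset, fields[2], operands


def label_listing(lines):
    parsed = [parse(l) for l in lines if l.strip()]
    # Pass 1: collect every referenced address, sorted, duplicates removed.
    targets = sorted({int(ops, 16) for _, m, ops in parsed
                      if BRANCH.match(m) and HEX4.match(ops)})
    present = {off for off, _, _ in parsed}
    # Pass 2: drop the address and opcode columns, label referenced lines.
    out = []
    for off, m, ops in parsed:
        if BRANCH.match(m) and HEX4.match(ops):
            ops = "L" + ops
        label = f"L{off:04X}:" if off in targets else ""
        out.append(f"{label:<8}{m} {ops}".rstrip())
    # Targets with no code in the listing ("No code for this label").
    missing = [t for t in targets if t not in present]
    return out, missing
```

The sketch omits the forced labels after JMP and RET and the blank-line spacing; it only shows how the reference list drives the second pass.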
-
- If the program finds a Jump or CALL to an address not
- contained in the file you will get the message "WARNING! No
- code for this label". This most likely means you missed the
- block of code starting at address hhhh and will have to add
- it to FILE.DB. The statement after an unconditional program
- transfer (JMP or RET) is always labeled. The message "WARNING!
- This label not referenced" means that there is no Jump or CALL
- to this label. It might be an interrupt handler or, in a highly
- modified program, it might be code left over from an earlier
- version which is no longer executed. (NOP instructions are
- not force labeled, but the following instruction is.) A large
- number of these errors might indicate that they are accessed
- by an address table. Both of the above errors might occur
- if you miss a block of code, unassemble a data area, or the
- code modifies itself.
-
- For added readability, UASM-JMP inserts one blank line
- after each JMP or JMPS instruction and three lines after a
- RET or IRET. This helps separate procedures.
-
-
-
- UASM-INT Instructions
-
- To run UASM-INT you must also have the data file
- UASM-DOS.MAC on one of the drives. UASM-INT will prompt you
- for input and output file names. If extensions are not provided,
- .JMP and .INT will be assumed for input and output respective-
- ly. The program then loads the symbol table contained in UASM-
- DOS.MAC. While reading through FILE.JMP, whenever UASM-INT
- encounters an INT instruction it adds a Macro call, Symbols
- for the DOS function calls, and Comments, all from the UASM-
- DOS.MAC file. These lines will also be displayed on the screen
- as the program progresses. Note that the DOSCALL Macro is
- inserted in the text, but the INT instructions are not deleted.
- After you have checked the code you must delete the INT and
- any MOV instructions that will be duplicated by the Macro.
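
The idea behind this step can be sketched as follows. This is Python, not the UASM-INT source, and the names DOS_FUNCTIONS and annotate are hypothetical. The small table holds only the six function symbols that appear in the sample output of this document, where the real program reads its table from UASM-DOS.MAC. The sketch assumes the function number is the most recent value moved into AH (or the high byte of AX) before an INT 21.

```python
import re

# Function number -> (symbol, comment), copied from the sample output.
DOS_FUNCTIONS = {
    0x09: ("PRINT$", "Display string @DX till terminator"),
    0x0A: ("INSTR$", "Input keyboard string (DX -> size,cnt,buffer)"),
    0x1A: ("SET$DTA", "Set Disk Transfer Address to DX"),
    0x25: ("SET$INT", "Set interrupt vector (AL=INT, DS:DX=VECTOR)"),
    0x26: ("BUILD$PS", "Create new program segment (DX=SEGMENT)"),
    0x29: ("PARSE$", "Parse Filespec (SI -> LINE, DI -> FCB, AL=CODE)"),
}

MOV_AH = re.compile(r"^\s*MOV\s+AH,([0-9A-F]{2})\s*$")
MOV_AX = re.compile(r"^\s*MOV\s+AX,([0-9A-F]{4})\s*$")
INT_21 = re.compile(r"^\s*INT\s+21\s*$")
LABEL = re.compile(r"^\s*\w+:")


def annotate(lines):
    """Copy lines, adding a DOSCALL line after each recognized INT 21."""
    ah = None
    out = []
    for line in lines:
        out.append(line)
        code = LABEL.sub("", line.split(";")[0])   # drop comment and label
        m = MOV_AH.match(code)
        if m:
            ah = int(m.group(1), 16)
            continue
        m = MOV_AX.match(code)
        if m:
            ah = int(m.group(1)[:2], 16)           # AH is the high byte of AX
            continue
        if INT_21.match(code) and ah in DOS_FUNCTIONS:
            name, comment = DOS_FUNCTIONS[ah]
            out.append(f"        DOSCALL {name} ; {comment}")
    return out
```

Because the last known AH value is carried forward, a stale value can produce a wrong symbol when AH is set by some other instruction. This is one reason the INT instructions are left in place for you to check.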
-
-
- UASM-STR Instructions
-
- To run UASM-STR you must have the original FILE.COM or
- other binary file on disk. The program will prompt you for
- the input, output, and binary file names. These will default
- to .INT, .STR and .COM if no other extension is given. As
- usual, the input file name will be used as a default if you
- do not specify the others, and you should put the output file
- on a different floppy drive than the input file.
-
- You will then be prompted for any string area addresses
- that you may have found while examining FILE.COM with DEBUG.
- You may enter an address range (hhhh kkkk), the address of
- a single string (hhhh), or an address and a length (hhhh Ln)
- on each line. (Up to 20 lines). Upon receiving a blank line
- as input, the program will find all strings terminated with
- a $ starting at the first address in a range and continue find-
- ing multiple strings to the second address if present. If
- a single address is given on a line a single string will be
- read. If a length is provided, the string will be truncated
- to that length or at the terminator, whichever comes first.
- (This is useful for data strings which do not have $
- terminators.) Each string is displayed as it is found.
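
The string search described above can be sketched in a few lines. This is Python, not the UASM-STR source; the function names are hypothetical. It assumes the file image is loaded at offset 100 hex, as for any .COM file, and that the $ terminator is kept as part of the string.

```python
COM_LOAD_OFFSET = 0x100   # a .COM image is loaded after the 100H byte prefix


def read_string(image: bytes, addr: int, max_len=None) -> bytes:
    """Return the string at addr, up to and including its $ terminator.

    If max_len is given, the string is cut at that length or at the
    terminator, whichever comes first.
    """
    start = addr - COM_LOAD_OFFSET
    end = image.find(b"$", start)
    end = len(image) if end == -1 else end + 1
    if max_len is not None:
        end = min(end, start + max_len)
    return image[start:end]


def read_strings(image: bytes, first: int, last: int):
    """Return (address, string) pairs for consecutive strings in a range."""
    found = []
    addr = first
    while addr <= last:
        s = read_string(image, addr)
        if not s:
            break
        found.append((addr, s))
        addr += len(s)      # the next string starts right after the terminator
    return found
```

Here image would be the bytes of FILE.COM read in binary mode, and the addresses are the ones noted while examining the file with DEBUG.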
-
- Following this the program reads through FILE.INT. For
- each "DOSCALL PRINT$ hhhh" encountered it reads the string
- from FILE.COM at the specified location (taking into account
- the 100H byte program prefix) and prints that string as a com-
- ment next to the Macro. Also, each time a register is loaded
- with the address of a string, that string is shown next to
- the code. At the end of the file, UASM-STR will append a number
- of EQUates and Data statements and define the string variables
- with names Dhhhh. Non-printing characters are converted into
- hex bytes. CR, LF, TAB, ESC, and $ are defined as symbols.
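
A sketch of how such a data statement might be built is given below. It is Python, not the UASM-STR source, and the name db_statement is hypothetical. Two choices are assumptions read off the sample output rather than documented behavior: a $ that ends a run of text stays inside the quotes, while a $ that follows a symbol is written as the symbol $. The leading 0 on hex bytes that begin with a letter is added here because the assembler would otherwise read them as names.

```python
SYMBOLS = {0x0D: "CR", 0x0A: "LF", 0x09: "TAB", 0x1B: "ESC", 0x24: "$"}


def db_statement(addr: int, data: bytes) -> str:
    """Format data as 'Dhhhh DB ...' with quoted text, symbols and hex bytes."""
    parts = []
    text = ""
    for b in data:
        printable = 0x20 <= b <= 0x7E and b != 0x27   # a quote cannot be quoted
        if printable and (b != 0x24 or text):
            text += chr(b)            # extend the current run of quoted text
            continue
        if text:                      # close the run before a non-text item
            parts.append(f"'{text}'")
            text = ""
        if b in SYMBOLS:
            parts.append(SYMBOLS[b])
        else:
            h = f"{b:02X}"
            parts.append(h if h[0].isdigit() else "0" + h)
    if text:
        parts.append(f"'{text}'")
    return f"D{addr:04X} DB " + ",".join(parts)
```

For example, the bytes of "Disk$" at address 1701 come out as D1701 DB 'Disk$'.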
-
- DOSCALLs that do file I/O and that load the address of
- the File Control Block into DX will generate that FCB as a
- string. Any address which is used within brackets (e.g.
- LEA DX,[hhhh]), that is not already a known string address,
- is assumed to be the address of a variable. A data statement
- is generated for the variable, and two bytes are extracted
- from FILE.COM to show its initial value.
-
-
- SAMPLE OUTPUT - Excerpts from DEBUG.STR
-
- INCLUDE UASM-DOS.MAC
- .RADIX 16
-
- START: JMPS L011D
-
- L011D: MOV SP,1822
- MOV [1897],AL
- MOV DX,0102
- MOV AH,09
- INT 21
-
- DOSCALL PRINT$,D0102 ;CR,LF,'DEBUG-86 version 1.07',CR,LF,$
- MOV AX,2522
- MOV DX,01E6
- INT 21
-
- DOSCALL SET$INT 01E6 ; Set interrupt vector (AL=INT, DS:DX=VECTOR)
-
- MOV AL,23
- MOV DX,01EB
- INT 21
-
- DOSCALL SET$INT 01EB ; Set interrupt vector (AL=INT, DS:DX=VECTOR)
-
- MOV DX,CS
- ADD DX,01AB
- MOV AH,26
- INT 21
-
- DOSCALL BUILD$PS 01AB ; Create new program segment (DX=SEGMENT)
-
-
- MOV AX,DX
- MOV DI,1832
- STOSW
- MOV DX,0080
- MOV AH,1A
- INT 21
-
- DOSCALL SET$DTA 0080 ; Set Disk Transfer Address to DX
-
- MOV AX,[0006]
- MOV BX,AX
- CMP AX,FFF0
- PUSH CS
- POP DS
- ADD [0008],BX
- MOV DI,005C
- MOV SI,0081
- MOV AX,2901
- INT 21
-
- DOSCALL PARSE$ ; Parse Filespec (SI -> LINE, DI -> FCB, AL=CODE)
-
- CALL L0917
- PUSH CS
- POP ES
- CMP B,[005D],20
- JZ L01B5
- JMPS L01B5
-
- L01E3: JMP L04CB
-
- L01E6: MOV DX,167A ;WARNING! This label not referenced
- MOV DS,AX
- MOV SS,AX
- MOV SP,1822
- MOV AH,09
- INT 21
-
- DOSCALL PRINT$ ; Display string @DX till terminator
-
- JMPS L01B5
-
- L01FD: MOV AH,0A
- MOV DX,1844
- INT 21
-
- DOSCALL INSTR$ 1844 ; Input keyboard string (DX -> size,cnt,buffer)
-
- MOV SI,1846
- ;END CODE
- .RADIX 16
- CR EQU 0D
- LF EQU 0A
- TAB EQU 09
- ESC EQU 1B
- $ EQU 24
- D167A DB CR,LF,'Program terminated normally',CR,LF,$
- D169A DB 'Invalid drive or file name',CR,LF,$
- D16B7 DB 'File not found',CR,LF,$
- D16C8 DB 'No room in disk directory',CR,LF,$
- D16E4 DB 'Insufficient space on disk',CR,LF,$
- D1701 DB 'Disk$'
- D1706 DB 'Write protect$'
- D1714 DB ' error reading drive A',CR,LF,$
- D172D DB 'readwritInsufficient memory',CR,LF,$
- D174B DB '^ Error',CR,8A,' ',88,'Error in EXE/HEX file',CR,LF,$
- D176E DB 'EXE/HEX file cannot be written',CR,LF,$
- D178F DB 'Writing $'
- D1798 DB ' bytes',CR,LF,$
- D0102 DB CR,LF,'DEBUG-86 version 1.07',CR,LF,$
-